Performance and Scalability of Preconditioned Conjugate Gradient Methods on Parallel Computers

Authors

  • Anshul Gupta
  • Vipin Kumar
  • Ahmed H. Sameh
Abstract

This paper analyzes the performance and scalability of an iteration of the Preconditioned Conjugate Gradient (PCG) algorithm on parallel architectures with a variety of interconnection networks, such as the mesh, the hypercube, and that of the CM-5™† parallel computer. It is shown that for block-tridiagonal matrices resulting from two-dimensional finite difference grids, the communication overhead due to vector inner products dominates the communication overheads of the remainder of the computation on a large number of processors. However, with a suitable mapping, the parallel formulation of a PCG iteration is highly scalable for such matrices on a machine like the CM-5, whose fast control network practically eliminates the overheads due to inner product computation. The use of the truncated Incomplete Cholesky (IC) preconditioner can lead to further improvement in scalability on the CM-5 by a constant factor. As a result, a parallel formulation of the PCG algorithm with IC preconditioner may execute faster than that with a simple diagonal preconditioner even if the latter runs faster in a serial implementation. For the matrices resulting from three-dimensional finite difference grids, the scalability is quite good on a hypercube or the CM-5, but not as good on a 2-D mesh architecture. In the case of random unstructured sparse matrices with a constant number of non-zero elements in each row, the parallel formulation of the PCG iteration is unscalable on any message passing parallel architecture, unless some ordering is applied on the sparse matrix. The parallel system can be made scalable either if, after re-ordering, the non-zero elements of the N × N matrix can be confined in a band whose width is O(N^y) for any y < 1, or if the number of non-zero elements per row increases as N^x for any x > 0. Scalability increases as the number of non-zero elements per row is increased and/or the width of the band containing these elements is reduced. For unstructured sparse matrices, the scalability is asymptotically the same for all architectures. Many of these analytical results are experimentally verified on the CM-5 parallel computer.

∗ This work was supported by IST/SDIO through the Army Research Office grant # 28408-MA-SDI to the University of Minnesota and by the University of Minnesota Army High Performance Computing Research Center under contract # DAAL03-89-C-0038.
† CM-5 is a trademark of the Thinking Machines Corporation.
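To make the cost structure discussed in the abstract concrete, here is a minimal serial sketch of PCG with a diagonal preconditioner, applied to the block-tridiagonal matrix of a two-dimensional five-point finite difference grid. It is an illustration under stated assumptions, not the paper's parallel formulation: the function names (`laplacian_2d_matvec`, `dot`, `pcg_diagonal`), the Dirichlet boundary treatment, and the tolerance are all hypothetical choices. The calls to `dot` mark where a parallel code would need a global reduction across processors, which is the inner-product overhead the abstract refers to, while the matrix-vector product only needs values from neighboring grid points.

```python
import math


def laplacian_2d_matvec(x, n):
    """Apply the 5-point finite difference matrix on an n x n grid
    (assumed zero Dirichlet boundary). Only nearest-neighbor values are used."""
    y = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            s = 4.0 * x[k]
            if i > 0:
                s -= x[k - n]
            if i < n - 1:
                s -= x[k + n]
            if j > 0:
                s -= x[k - 1]
            if j < n - 1:
                s -= x[k + 1]
            y[k] = s
    return y


def dot(u, v):
    """Vector inner product. In a parallel code this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def pcg_diagonal(b, n, tol=1e-10, max_iter=1000):
    """Solve A x = b by PCG, with M = diag(A) as the preconditioner.
    The diagonal of this matrix is the constant 4, so M^{-1} r = r / 4."""
    inv_diag = 1.0 / 4.0
    x = [0.0] * (n * n)
    r = list(b)                          # residual for the zero initial guess
    z = [inv_diag * ri for ri in r]      # preconditioned residual
    p = list(z)                          # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = laplacian_2d_matvec(p, n)   # local (neighbor) communication only
        alpha = rz / dot(p, Ap)          # inner product: global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:   # convergence check
            return x, it
        z = [inv_diag * ri for ri in r]
        rz_new = dot(r, z)               # inner product: global reduction
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A possible usage: build a right-hand side from a known solution with `b = laplacian_2d_matvec(x_true, n)`, then call `pcg_diagonal(b, n)` and compare the result with `x_true`. Replacing the diagonal scaling with an Incomplete Cholesky solve would change only the two lines that compute `z`.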

Similar resources

University of Colorado at Denver and Health Sciences Center Preconditioned Eigensolver LOBPCG in hypre and PETSc

We present preliminary results of an ongoing project to develop codes of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems for hypre and PETSc software packages. hypre and PETSc provide high quality domain decomposition and multigrid preconditioning for parallel computers. Our LOBPCG implementation for hypre is publicly available in hy...

The communication-hiding pipelined BiCGstab method for the parallel solution of large unsymmetric linear systems

A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algori...

Analysis of Parallel Preconditioned Conjugate Gradient Algorithms

The conjugate gradient method is an iterative technique used to solve systems of linear equations. The paper analyzes the performance of parallel preconditioned conjugate gradient algorithms. First, a theoretical model is proposed for estimation of the complexity of PPCG method and a scalability analysis is done for three different data decomposition cases. Computational experiments are done on...

Vectorization of some block preconditioned conjugate gradient methods

The block preconditioned conjugate gradient methods are very effective to solve the linear systems arising from the discretization of elliptic PDE. Nevertheless, the solution of the linear system Ms = r, to get the preconditioned residual, is a 'bottleneck', on vector processors. In this paper, we show how to modify the algorithm, in order to get better performances, on such computers. Numerica...

A Parallel Multigrid Preconditioned Conjugate Gradient Algorithm for Groundwater Flow Simulations

This paper discusses the numerical simulation of groundwater flow through heterogeneous porous media. The focus is on the performance of a parallel multigrid preconditioner for accelerating convergence of conjugate gradients, which is used to compute the pressure head. The numerical investigation considers the effects of boundary conditions, coarse grid solver strategy, increasing the grid resolution, e...

Deflation accelerated parallel preconditioned Conjugate Gradient method in Finite Element problems

We describe the algorithm to implement a deflation acceleration in a preconditioned Conjugate Gradient method to solve the system of linear equations from a Finite Element discretization. We focus on a parallel implementation in this paper. Subsequently we describe the data-structure. This is followed by some numerical experiments. The experiments indicate that our method is scalable.

Journal:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 6  Issue 

Pages  -

Publication date: 1995